Skip to content

Conversation

@tdayris
Copy link
Contributor

@tdayris tdayris commented Oct 20, 2025

This PR tries to make salmon-tximport meta-wrapper accessible through pathvars. Yet, I'd like to get some advice on how to proceed smoothly with the expand keyword and the pathvars. But the meta-wrapper wouldn't work if user does not modify the last rule : for now, I still list samples within the expand function.

QC

While the contributions guidelines are more extensive, please particularly ensure that:

  • test.py was updated to call any added or updated example rules in a Snakefile
  • input: and output: file paths in the rules can be chosen arbitrarily
  • wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:)
  • temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to
  • the meta.yaml contains a link to the documentation of the respective tool or command under url:
  • conda environments use a minimal amount of channels and packages, in recommended ordering

Summary by CodeRabbit

Release Notes

  • New Features
    • Added configurable path variables system enabling users to customize workflow input/output locations. Users can now specify paths for transcriptome and genome sequences, genome annotations, FASTQ reads, transcript-to-gene mappings, and output result directories instead of using fixed defaults.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 20, 2025

📝 Walkthrough

Walkthrough

The changes introduce a parameterized path configuration system for the salmon_tximport workflow. A new pathvars section defines default directories and custom path mappings for transcriptome, genome, reads, and annotation files. Hardcoded file paths throughout the workflow rules are replaced with parameterized placeholders, and test configurations are updated accordingly.

Changes

Cohort / File(s) Summary
Configuration system
meta/bio/salmon_tximport/meta.yaml, meta/bio/salmon_tximport/test/config.yaml
Introduces pathvars section with default directories (results, resources, logs) and custom mappings for transcriptome_sequence, genome_sequence, genome_annotation, reads_r1, reads_r2, tx_to_gene, and per placeholder keys with descriptive comments
Workflow rule parameterization
meta/bio/salmon_tximport/meta_wrapper.smk
Replaces hardcoded resource paths with placeholders across four rules: salmon_decoy_sequences (transcriptome/genome inputs, gentrome/decoys outputs, logs), salmon_index_gentrome (input sequences/decoys, extended output file list, index path, logs), salmon_quant_reads (paired-end read inputs, index/annotation references, per-sample result outputs and logs), tximport (per-sample inputs, tx_to_gene/txi outputs under new directories, logs)
Test infrastructure
meta/bio/salmon_tximport/test/Snakefile, test_wrappers.py
Snakefile adds configfile reference and converts meta_wrapper to block format with config section; test_wrappers.py updates output path from tximport/SummarizedExperimentObject.RDS to results/tximport/SummarizedExperimentObject.RDS
Test resources
meta/bio/salmon_tximport/test/resources/genome.fasta.fai, meta/bio/salmon_tximport/test/resources/transcriptome.fasta, meta/bio/salmon_tximport/test/resources/transcriptome.fasta.fai
Adds genome FASTA index file, modifies transcriptome sequences (start/end adjustments and sequence expansion), and adds corresponding FASTA index entries

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Review requires verifying path placeholder consistency across multiple interdependent rules, confirming test configuration mappings align with meta.yaml definitions, validating placeholder syntax and output directory structure changes, and ensuring test resources remain compatible with updated FASTA formats.

Suggested reviewers

  • johanneskoester

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "feat: Salmon-tximport meta-wrapper pathvars update" follows conventional commit format and directly describes the main change in the pull request. The title clearly indicates that the primary modification is adding pathvars configuration to the salmon-tximport meta-wrapper, which aligns with the substantial updates made to meta.yaml (adding pathvars section), meta_wrapper.smk (replacing hardcoded paths with parameterized placeholders), and test files. The title is specific and concise enough that a reviewer scanning the history would understand this is about parameterizing paths in the salmon-tximport wrapper.
Description Check ✅ Passed The PR description provides a clear explanation of the intent to make the salmon-tximport meta-wrapper accessible through pathvars and acknowledges the current limitation that users must still modify the last rule. The description includes all required QC checklist items, with six items marked as completed and one item ("the meta.yaml contains a link to the documentation of the respective tool or command under url:") left unchecked. The description is mostly complete and directly addresses the pull request objective; however, the unchecked URL documentation item suggests this aspect may not have been fully implemented.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0bcf3ab and b679f61.

⛔ Files ignored due to path filters (8)
  • meta/bio/salmon_tximport/test/reads/S1.fastq.gz is excluded by !**/*.gz
  • meta/bio/salmon_tximport/test/reads/S1_R1.fq.gz is excluded by !**/*.gz
  • meta/bio/salmon_tximport/test/reads/S1_R2.fq.gz is excluded by !**/*.gz
  • meta/bio/salmon_tximport/test/reads/S2.fastq.gz is excluded by !**/*.gz
  • meta/bio/salmon_tximport/test/reads/S2_R1.fq.gz is excluded by !**/*.gz
  • meta/bio/salmon_tximport/test/reads/S2_R2.fq.gz is excluded by !**/*.gz
  • meta/bio/salmon_tximport/test/reads/S3.fastq.gz is excluded by !**/*.gz
  • meta/bio/salmon_tximport/test/reads/S4.fastq.gz is excluded by !**/*.gz
📒 Files selected for processing (8)
  • meta/bio/salmon_tximport/meta.yaml (1 hunks)
  • meta/bio/salmon_tximport/meta_wrapper.smk (5 hunks)
  • meta/bio/salmon_tximport/test/Snakefile (1 hunks)
  • meta/bio/salmon_tximport/test/config.yaml (1 hunks)
  • meta/bio/salmon_tximport/test/resources/genome.fasta.fai (1 hunks)
  • meta/bio/salmon_tximport/test/resources/transcriptome.fasta (1 hunks)
  • meta/bio/salmon_tximport/test/resources/transcriptome.fasta.fai (1 hunks)
  • test_wrappers.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • test_wrappers.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: docs
  • GitHub Check: testing
  • GitHub Check: Summary
🔇 Additional comments (7)
meta/bio/salmon_tximport/test/resources/transcriptome.fasta.fai (1)

1-2: LGTM: Test resource file is correctly formatted.

The FASTA index file follows the standard .fai format and provides appropriate test data for the pathvars-based configuration system.

meta/bio/salmon_tximport/test/resources/genome.fasta.fai (1)

1-1: LGTM: Genome index file is correctly formatted.

The FASTA index file is properly structured and aligns with the new pathvars configuration for test resources.

meta/bio/salmon_tximport/test/config.yaml (1)

1-9: LGTM: Test configuration properly defines pathvars mappings.

The pathvars configuration aligns with the schema defined in meta.yaml and provides appropriate test resource paths.

test_wrappers.py (1)

1104-1104: LGTM: Test expectation updated to match new pathvars structure.

The path correctly reflects the new results-prefixed output location defined in the meta-wrapper.

meta/bio/salmon_tximport/test/resources/transcriptome.fasta (1)

1-5: LGTM: Test sequences updated appropriately.

The FASTA sequences have been updated and their lengths match the corresponding .fai index file entries (transcript1: 49bp, transcript2: 81bp).

meta/bio/salmon_tximport/test/Snakefile (1)

6-13: LGTM: Configuration properly integrated into test workflow.

The addition of configfile and the block-style meta_wrapper with explicit config section correctly enables the pathvars-based configuration system.

meta/bio/salmon_tximport/meta_wrapper.smk (1)

1-89: LGTM: Placeholder-based parameterization for salmon rules is well-implemented.

The salmon_decoy_sequences, salmon_index_gentrome, and salmon_quant_reads rules correctly use pathvars placeholders (<transcriptome_sequence>, <genome_sequence>, <resources>, <logs>, <reads_r1>, <reads_r2>, <genome_annotation>, <per>) to enable user configuration without code modification.

Comment on lines 91 to 125
rule tximport:
input:
quant=expand(
"pseudo_mapping/{sample}/quant.sf", sample=["S1", "S2", "S3", "S4"]
"<results>/pseudo_mapping/{sample}/quant.sf",
sample=["S1", "S2"],
),
lib=expand(
"pseudo_mapping/{sample}/lib_format_counts.json",
sample=["S1", "S2", "S3", "S4"],
"<results>/pseudo_mapping/{sample}/lib_format_counts.json",
sample=["S1", "S2"],
),
aux_info=expand(
"pseudo_mapping/{sample}/aux_info", sample=["S1", "S2", "S3", "S4"]
"<results>/pseudo_mapping/{sample}/aux_info",
sample=["S1", "S2"],
),
cmd_info=expand(
"pseudo_mapping/{sample}/cmd_info.json", sample=["S1", "S2", "S3", "S4"]
"<results>/pseudo_mapping/{sample}/cmd_info.json",
sample=["S1", "S2"],
),
libparams=expand(
"pseudo_mapping/{sample}/libParams", sample=["S1", "S2", "S3", "S4"]
"<results>/pseudo_mapping/{sample}/libParams",
sample=["S1", "S2"],
),
logs=expand("pseudo_mapping/{sample}/logs", sample=["S1", "S2", "S3", "S4"]),
tx_to_gene="resources/tx2gene.tsv",
logs=expand(
"<results>/pseudo_mapping/{sample}/logs",
sample=["S1", "S2"],
),
tx_to_gene="<tx_to_gene>",
output:
txi="tximport/SummarizedExperimentObject.RDS",
txi="<results>/tximport/SummarizedExperimentObject.RDS",
params:
extra="type='salmon'",
log:
"logs/tximport.log"
"<logs>/tximport.log",
wrapper:
"master/bio/tximport"
Copy link
Contributor

@coderabbitai coderabbitai bot Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Hardcoded sample list prevents user configuration.

The tximport rule hardcodes the sample list as ["S1", "S2"] in all six expand() calls (lines 95, 99, 103, 107, 111, 115). This is the exact issue mentioned in the PR description: "the meta-wrapper will not function unless users modify the last rule; currently samples are still listed within the expand function."

This defeats the purpose of the pathvars system, as users must edit the wrapper code to use their own samples.

Suggested solution:

  1. Add a samples pathvar to meta.yaml:
custom:
  samples: List of sample identifiers to process (e.g., ["sample1", "sample2"])
  1. In meta_wrapper.smk, replace the hardcoded lists with a placeholder or configuration reference:
# Option 1: Use a placeholder (if Snakemake supports this for lists)
sample=<samples>,

# Option 2: Reference from config (more traditional approach)
sample=config.get("samples", ["S1", "S2"]),
  1. Update test/config.yaml:
pathvars:
  samples: ["S1", "S2"]

This would allow users to configure samples in their config file without touching the wrapper code.

Do you want me to help generate a solution that makes the sample list configurable through pathvars or config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, pathvars only expand to strings. One could use config["pathvars"]["samples"] to get past this limitation, but it does not comply with the way pathvars were designed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🌐 Web query:

Snakemake pathvars expand sample list configuration

💡 Result:

Short answer + examples.

  1. Put samples (or a samples file path) in config.yaml and load it:
    config.yaml:
    samples: ["s1","s2","s3"]
    data_dir: "data"

Snakefile:
configfile: "config.yaml"

rule all:
input:
expand("{dir}/{sample}.fastq.gz", dir=config["data_dir"], sample=config["samples"])

This uses expand with config lists and path variables. [1][2]

  1. If your config holds per-sample paths (dict), use them directly or via an input function (deferred evaluation) so wildcards resolve at DAG time:

config.yaml:
samples:
s1: "data/s1_R1.fastq.gz"
s2: "data/s2_R1.fastq.gz"

Snakefile:
configfile: "config.yaml"

rule map:
input:
lambda wildcards: config["samples"][wildcards.sample]
output:
"mapped/{sample}.bam"
shell:
"mapper {input} > {output}"

Use an input function when inputs depend on a job's wildcard values. [4]

  1. If you store only a samples TSV, read it and use its index or column:

config.yaml:
samples: "samples.tsv"

Snakefile:
import pandas as pd
configfile: "config.yaml"
samples = pd.read_table(config["samples"]).set_index("sample").index.tolist()

rule all:
input: expand("qc/{sample}_qc.html", sample=samples)

Snakemake docs show using config to list samples and validating sample sheets. [1][3]

References
[1] Snakemake configuration docs (examples with expand and config).
[2] Snakemake examples/tutorials showing expand usage with config lists.
[3] Using a samples TSV / reading config-specified sample sheets.
[4] Deferred input functions explanation (use lambda or def for inputs that depend on wildcards).


@tdayris, thank you for clarifying the pathvars limitation! That makes sense—pathvars are designed for path string substitution, not data structures.

Given that constraint, the standard Snakemake approach would be to keep the sample list in config (outside of pathvars) and reference it directly in the expand() calls. Here's the recommended solution:

1. In meta.yaml, document that samples should be provided in config:

# Add a comment in the documentation section
# Note: Sample list should be provided via config["samples"] as pathvars only support strings

2. In meta_wrapper.smk, replace the hardcoded sample list with a config reference:

rule tximport:
    input:
        quant=expand(
            "<results>/pseudo_mapping/{sample}/quant.sf",
            sample=config.get("samples", ["S1", "S2"]),
        ),
        lib=expand(
            "<results>/pseudo_mapping/{sample}/lib_format_counts.json",
            sample=config.get("samples", ["S1", "S2"]),
        ),
        aux_info=expand(
            "<results>/pseudo_mapping/{sample}/aux_info",
            sample=config.get("samples", ["S1", "S2"]),
        ),
        cmd_info=expand(
            "<results>/pseudo_mapping/{sample}/cmd_info.json",
            sample=config.get("samples", ["S1", "S2"]),
        ),
        libparams=expand(
            "<results>/pseudo_mapping/{sample}/libParams",
            sample=config.get("samples", ["S1", "S2"]),
        ),
        logs=expand(
            "<results>/pseudo_mapping/{sample}/logs",
            sample=config.get("samples", ["S1", "S2"]),
        ),
        tx_to_gene="<tx_to_gene>",

3. In test/config.yaml, add the samples list outside of pathvars:

samples: ["S1", "S2"]

pathvars:
  # existing pathvars...

This approach:

  • ✅ Complies with pathvars design (strings only)
  • ✅ Follows standard Snakemake patterns for sample lists
  • ✅ Allows users to configure samples in their config file without editing wrapper code
  • ✅ Provides sensible defaults for testing

Would you like me to help refine this solution further?

🐰📝


✏️ Learnings added
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 4665
File: meta/bio/salmon_tximport/meta_wrapper.smk:91-125
Timestamp: 2025-10-31T14:56:28.220Z
Learning: In Snakemake wrappers, pathvars only expand to strings, not lists or other data structures. Sample lists should be provided via config["samples"] rather than through pathvars, following standard Snakemake configuration patterns.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 4285
File: bio/agat/wrapper.py:336-340
Timestamp: 2025-06-20T13:10:29.240Z
Learning: User tdayris prefers to fix identified code issues immediately rather than deferring them, as indicated by their comment "Now is better than later" when confirming code fixes.

Comment on lines +21 to +33
pathvars:
default:
- results
- resources
- logs
custom:
transcriptome_sequence: Path to FASTA file with transcriptome sequence
genome_sequence: Path to FASTA file with genome sequence
genome_annotation: Path to GTF file with genome annotation
reads_r1: Path/pattern for FASTQ files with R1 reads
reads_r2: Path/pattern for FASTQ files with R2 reads
tx_to_gene: Path to TSV table with transcript_id <tab> gene_id
per: Pattern for sample identifiers, e.g. ``"{sample}"``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Pathvars structure is well-defined, but sample list configuration is missing.

The pathvars section properly defines default directories and custom path mappings. However, there's no pathvar or configuration mechanism for users to specify the list of samples to process. Users would need to know to look at the meta_wrapper.smk file and modify the hardcoded sample list in the tximport rule (currently ["S1", "S2"]), which is not user-friendly.

Consider adding a pathvar or configuration parameter for the sample list, such as:

  • samples: List of sample identifiers to process

This would allow users to specify their samples in the config file rather than modifying the wrapper code.


🏁 Script executed:

#!/bin/bash
# Inspect the complete meta.yaml file to verify structure
cat meta/bio/salmon_tximport/meta.yaml

Length of output: 1769


🏁 Script executed:

#!/bin/bash
# Find and inspect the meta_wrapper.smk file
find . -name "meta_wrapper.smk" -o -name "*wrapper*" -type f | grep -i salmon | head -5

Length of output: 180


🏁 Script executed:

#!/bin/bash
# Search for sample list configuration patterns in salmon_tximport directory
find meta/bio/salmon_tximport -type f \( -name "*.smk" -o -name "*.yaml" -o -name "*.yml" -o -name "*.json" \) -exec ls -la {} \;

Length of output: 355


🏁 Script executed:

#!/bin/bash
# Check the salmon_tximport directory structure
ls -la meta/bio/salmon_tximport/

Length of output: 364


🏁 Script executed:

#!/bin/bash
# Inspect meta_wrapper.smk to find sample list definitions
cat meta/bio/salmon_tximport/meta_wrapper.smk

Length of output: 3592


🏁 Script executed:

#!/bin/bash
# Inspect test config to see how samples are currently configured
cat meta/bio/salmon_tximport/test/config.yaml

Length of output: 316


🏁 Script executed:

#!/bin/bash
# Check if there are other meta wrappers to understand pattern for samples configuration
find meta/bio -name "meta.yaml" -exec grep -l "pathvars" {} \; | head -3 | xargs -I {} sh -c 'echo "=== {} ===" && cat {}'

Length of output: 4314


Hardcoded sample list in tximport rule prevents user configuration.

The tximport rule in meta_wrapper.smk (lines 108-127) hardcodes samples as ["S1", "S2"] in all expand() calls. Users cannot customize the sample list without modifying the wrapper code itself, which defeats the purpose of configurable meta-wrappers.

The test config demonstrates pathvar configuration for file paths and patterns (including per: "{sample}"), but provides no mechanism to specify the actual list of samples to process.

Add a configuration parameter for samples (either as a pathvar or config section) to allow users to specify their samples, such as:

samples: ["sample1", "sample2", "sample3"]

Then update the tximport rule to read this from config rather than hardcoding.

🤖 Prompt for AI Agents
In meta/bio/salmon_tximport/meta.yaml around lines 21 to 33, add a configurable
samples parameter (e.g. samples: ["sample1","sample2"]) so users can supply the
sample list instead of it being hardcoded; then update the tximport rule in
meta_wrapper.smk (around lines 108-127) to read the sample list from config
(config["samples"] or config.get("samples", [])) and use that value in all
expand() calls (with a sensible fallback or validation error if missing) rather
than the current hardcoded ["S1","S2"].

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant